Power Normal Distribution (powernorm) — a proportional-hazards / “minimum of Normals” family#
The power normal distribution is built by taking the Normal survival function and raising it to a positive power. It is a simple way to introduce skewness and to model extreme minima while staying close to the Normal baseline.
What you’ll learn#
how powernorm is defined (PDF/CDF/survival)
how the shape parameter c controls skewness and extremeness
how to compute moments/entropy numerically
how to sample using a NumPy-only inverse-CDF method
how to visualize and fit the model with scipy.stats.powernorm
import numpy as np
import plotly.graph_objects as go
import os
import plotly.io as pio
from scipy import stats
# Plotly rendering (CKC convention)
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
pio.templates.default = "plotly_white"
# Reproducibility
rng = np.random.default_rng(7)
np.set_printoptions(precision=4, suppress=True)
1) Title & Classification#
Name: powernorm (power normal distribution)
Type: Continuous
Support: \(x \in \mathbb{R}\)
Parameters (SciPy parameterization):
shape: \(c > 0\)
location: \(\text{loc} \in \mathbb{R}\)
scale: \(\text{scale} > 0\)
We write:
\[X \sim \mathrm{PowerNorm}(c, \text{loc}, \text{scale}).\]
When \(\text{loc}=0\) and \(\text{scale}=1\) we speak of the standardized distribution.
2) Intuition & Motivation#
2.1 What it models#
Let \(Z\) be a baseline Normal variable with CDF \(\Phi\) and survival function \(S_0(x)=1-\Phi(x)=\Phi(-x)\). The power normal distribution is defined by the transformed survival function
\[S(x; c) = \left[S_0(x)\right]^c = \left[\Phi(-x)\right]^c, \qquad c > 0.\]
This is a proportional hazards construction:
baseline hazard: \(h_0(x) = \dfrac{\phi(x)}{\Phi(-x)}\) (Normal hazard)
power-normal hazard: \(h(x) = c\,h_0(x)\)
So \(c\) acts like a hazard multiplier relative to the Normal baseline.
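We can check this identity directly; a minimal sketch (using the scipy.stats import above) compares the two hazards on a small grid.
# Sketch: the powernorm hazard equals c times the Normal hazard
c = 2.5
xs = np.linspace(-3.0, 3.0, 7)
h_powernorm = stats.powernorm(c).pdf(xs) / stats.powernorm(c).sf(xs)
h_normal = stats.norm.pdf(xs) / stats.norm.sf(xs)
print(np.allclose(h_powernorm, c * h_normal))  # True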
2.2 “Minimum of Normals” interpretation (integer \(c\))#
If \(c\) is a positive integer and \(Z_1,\dots,Z_c \overset{iid}{\sim} \mathcal{N}(0,1)\), then
\[\min(Z_1,\dots,Z_c) \sim \mathrm{PowerNorm}(c),\]
because \(P\left(\min_i Z_i > x\right) = \prod_{i=1}^{c} P(Z_i > x) = \left[\Phi(-x)\right]^c\).
Intuition: the minimum gets more extreme as \(c\) increases.
2.3 Real-world use cases#
Reliability / weakest-link modeling: the minimum of several latent “strength” variables
Quality control: the worst of \(c\) subcomponents drives the overall behavior
Risk modeling: left-tail emphasis (rare but severe negative events)
Survival analysis: a Normal baseline with proportional hazards scaling
2.4 Relations to other distributions#
\(c=1\) gives the standard Normal.
For integer \(c\), it is an order statistic (minimum) of Normal samples.
It is a member of “power” / Lehmann-type transformations (here applied to the survival function).
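A minimal numerical check of the first relation (a sketch using the imports above):
# Sketch: c = 1 collapses powernorm to the standard Normal
xs = np.linspace(-4.0, 4.0, 9)
print(np.allclose(stats.powernorm(1.0).pdf(xs), stats.norm.pdf(xs)))  # True
print(np.allclose(stats.powernorm(1.0).cdf(xs), stats.norm.cdf(xs)))  # True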
3) Formal Definition#
Let \(\phi\) and \(\Phi\) be the standard Normal PDF and CDF.
3.1 PDF (standardized form)#
For \(x \in \mathbb{R}\) and \(c>0\):
\[f(x; c) = c\,\phi(x)\,\left[\Phi(-x)\right]^{c-1}.\]
3.2 CDF / survival function#
Because \(\dfrac{d}{dx}\Phi(-x) = -\phi(x)\), the CDF has a simple closed form:
\[F(x; c) = 1 - \left[\Phi(-x)\right]^c, \qquad S(x; c) = \left[\Phi(-x)\right]^c.\]
3.3 Location–scale form#
For \(X \sim \mathrm{PowerNorm}(c,\text{loc},\text{scale})\) define
\[z = \frac{x - \text{loc}}{\text{scale}}.\]
Then
\[f(x) = \frac{c}{\text{scale}}\,\phi(z)\,\left[\Phi(-z)\right]^{c-1}, \qquad F(x) = 1 - \left[\Phi(-z)\right]^c.\]
def powernorm_logpdf(x: np.ndarray, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
"""Log-PDF using stable Normal log-CDF evaluation (SciPy)."""
x = np.asarray(x, dtype=float)
c = float(c)
loc = float(loc)
scale = float(scale)
if c <= 0:
raise ValueError("c must be > 0")
if scale <= 0:
raise ValueError("scale must be > 0")
z = (x - loc) / scale
return (
np.log(c)
- np.log(scale)
+ stats.norm.logpdf(z)
+ (c - 1.0) * stats.norm.logcdf(-z)
)
def powernorm_pdf(x: np.ndarray, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
return np.exp(powernorm_logpdf(x, c, loc=loc, scale=scale))
def powernorm_cdf(x: np.ndarray, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
"""CDF computed as 1 - sf, using expm1 for accuracy when sf is close to 1."""
x = np.asarray(x, dtype=float)
c = float(c)
loc = float(loc)
scale = float(scale)
if c <= 0:
raise ValueError("c must be > 0")
if scale <= 0:
raise ValueError("scale must be > 0")
z = (x - loc) / scale
log_sf = c * stats.norm.logcdf(-z)
return -np.expm1(log_sf)
def powernorm_ppf(q: np.ndarray, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
q = np.asarray(q, dtype=float)
c = float(c)
loc = float(loc)
scale = float(scale)
if c <= 0:
raise ValueError("c must be > 0")
if scale <= 0:
raise ValueError("scale must be > 0")
eps = np.finfo(float).eps
q = np.clip(q, eps, 1.0 - eps)
z = -stats.norm.ppf(np.power(1.0 - q, 1.0 / c))
return loc + scale * z
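As a sanity check, a short sketch comparing the helpers above against scipy.stats.powernorm (agreement is to floating-point tolerance):
# Sketch: hand-rolled pdf/cdf/ppf vs scipy.stats.powernorm
c, loc, scale = 2.5, 1.0, 2.0
xs = np.linspace(-5.0, 8.0, 9)
ref = stats.powernorm(c, loc=loc, scale=scale)
print(np.allclose(powernorm_pdf(xs, c, loc=loc, scale=scale), ref.pdf(xs)))  # True
print(np.allclose(powernorm_cdf(xs, c, loc=loc, scale=scale), ref.cdf(xs)))  # True
qs = np.array([0.05, 0.5, 0.95])
print(np.allclose(powernorm_ppf(qs, c, loc=loc, scale=scale), ref.ppf(qs)))  # True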
4) Moments & Properties#
Unlike many textbook families, powernorm does not generally have simple closed-form moments.
The key quantities are typically computed via numerical integration (or Monte Carlo).
4.1 Raw moments#
For the standardized form (\(\text{loc}=0,\text{scale}=1\)), the \(k\)-th raw moment is
\[\mathbb{E}\left[X^k\right] = \int_{-\infty}^{\infty} x^k\, c\,\phi(x)\left[\Phi(-x)\right]^{c-1}\,dx.\]
A useful change of variables is \(u = \Phi(-x)\), so \(du = -\phi(x)\,dx\) and \(x = -\Phi^{-1}(u)\):
\[\mathbb{E}\left[X^k\right] = \int_{0}^{1} \left(-\Phi^{-1}(u)\right)^k c\,u^{c-1}\,du.\]
This shows an equivalent generative representation:
If \(U \sim \mathrm{Beta}(c,1)\) then \(X = -\Phi^{-1}(U) \sim \mathrm{PowerNorm}(c)\).
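A quick simulation sketch of this representation (rng_demo is a throwaway generator introduced only for this illustration):
# Sketch: X = -Phi^{-1}(U), U ~ Beta(c, 1), reproduces PowerNorm(c)
rng_demo = np.random.default_rng(0)
c = 3.0
u = rng_demo.beta(c, 1.0, size=100_000)
u = np.clip(u, 1e-15, 1.0 - 1e-15)  # guard the endpoints
x_beta = -stats.norm.ppf(u)
print(x_beta.mean(), x_beta.var())             # ≈ -0.846, ≈ 0.560
print(stats.powernorm(c).stats(moments="mv"))  # ≈ (-0.8463, 0.5595)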
4.2 Mean, variance, skewness, kurtosis#
You can compute
mean \(\mu = \mathbb{E}[X]\)
variance \(\sigma^2 = \mathbb{E}[X^2]-\mu^2\)
skewness \(\gamma_1\)
excess kurtosis \(\gamma_2\) (kurtosis minus 3)
numerically. SciPy’s stats.powernorm(...).stats(moments='mvsk') uses robust numerical routines.
4.3 MGF / characteristic function#
The MGF and characteristic function can be written as integrals:
\[M(t) = \mathbb{E}\left[e^{tX}\right] = \int_{-\infty}^{\infty} e^{tx}\, c\,\phi(x)\left[\Phi(-x)\right]^{c-1}\,dx, \qquad \varphi(w) = \mathbb{E}\left[e^{iwX}\right] = \int_{-\infty}^{\infty} e^{iwx}\, c\,\phi(x)\left[\Phi(-x)\right]^{c-1}\,dx.\]
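These integrals are easy to evaluate with SciPy's generic numerical-expectation helper; a sketch for one MGF point (compare the Monte Carlo estimates two cells below):
# Sketch: MGF at t = 0.5 via numerical integration
c, t = 3.0, 0.5
print(stats.powernorm(c).expect(lambda x: np.exp(t * x)))  # ≈ 0.70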
4.4 Entropy#
The differential entropy is
\[h(X) = -\int_{-\infty}^{\infty} f(x)\,\log f(x)\,dx.\]
SciPy provides dist.entropy() (numerical).
# Numerical moments via SciPy (entropy via Monte Carlo)
cs = np.array([0.5, 0.8, 1.0, 1.5, 3.0, 8.0])
# SciPy's dist.entropy() uses numerical integration and may emit warnings for c<1.
# A robust alternative is a Monte Carlo estimate: h(X) = -E[log f(X)].
n_entropy = 80_000
rows = []
for c in cs:
dist = stats.powernorm(c)
mean, var, skew, exkurt = dist.stats(moments="mvsk")
x_ent = dist.rvs(size=n_entropy, random_state=rng)
ent_mc = -dist.logpdf(x_ent).mean()
rows.append([c, float(mean), float(var), float(skew), float(exkurt), float(ent_mc)])
rows = np.array(rows)
cols = ["c", "mean", "var", "skew", "excess_kurt", "entropy_mc"]
rows, cols
(array([[ 0.5 , 0.7043, 1.5572, 0.1372, -0.0169, 1.6371],
[ 0.8 , 0.209 , 1.1457, 0.0446, -0.0103, 1.4823],
[ 1. , 0. , 1. , 0. , 0. , 1.4112],
[ 1.5 , -0.3441, 0.7935, -0.0808, 0.0308, 1.3063],
[ 3. , -0.8463, 0.5595, -0.2132, 0.1166, 1.1236],
[ 8. , -1.4236, 0.3729, -0.3772, 0.2881, 0.914 ]]),
['c', 'mean', 'var', 'skew', 'excess_kurt', 'entropy_mc'])
# Monte Carlo check (MGF + characteristic function at a few points)
c0 = 3.0
n = 200_000
samples = stats.powernorm(c0).rvs(size=n, random_state=rng)
mc_mean = samples.mean()
mc_var = samples.var(ddof=0)
ts = np.array([-1.0, -0.5, 0.5, 1.0])
mgf_mc = np.array([np.mean(np.exp(t * samples)) for t in ts])
ws = np.array([0.5, 1.0, 2.0])
cf_mc = np.array([np.mean(np.exp(1j * w * samples)) for w in ws])
true_mean, true_var, true_skew, true_exkurt = stats.powernorm(c0).stats(moments="mvsk")
true_ent = stats.powernorm(c0).entropy()
{
"true_mean": float(true_mean),
"mc_mean": float(mc_mean),
"true_var": float(true_var),
"mc_var": float(mc_var),
"true_entropy": float(true_ent),
"mgf_mc(t)": dict(zip(ts, mgf_mc)),
"cf_mc(w)": dict(zip(ws, cf_mc)),
}
{'true_mean': -0.8462843753214871,
'mc_mean': -0.846065136921111,
'true_var': 0.5594672038228536,
'mc_var': 0.556191412401224,
'true_entropy': 1.1248251350586798,
'mgf_mc(t)': {-1.0: 3.1238705507099147,
-0.5: 1.6394311262015824,
0.5: 0.701076896522755,
1.0: 0.559707628301136},
'cf_mc(w)': {0.5: (0.8513324178004397-0.38152936152009354j),
1.0: (0.510420735156466-0.5607402301350501j),
2.0: (-0.004128908027490085-0.3364882431292367j)}}
5) Parameter Interpretation#
5.1 Shape parameter \(c\)#
The key identity is the survival function:
\[S(x; c) = \left[\Phi(-x)\right]^c.\]
\(c=1\): exactly standard Normal.
\(c>1\): \(S(x)\) shrinks faster than the Normal survival, so the distribution shifts left (more extreme minima) and typically has negative skewness.
\(0<c<1\): \(S(x)\) shrinks more slowly; the distribution shifts right and typically has positive skewness.
If \(c\) is an integer, \(c\) is literally the number of Normal draws whose minimum you are taking.
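A small sketch of this shift: the median is \(F^{-1}(1/2) = -\Phi^{-1}(2^{-1/c})\), which moves left as \(c\) grows.
# Sketch: medians drift left as c increases
for c in [0.5, 1.0, 2.0, 5.0]:
    print(c, float(stats.powernorm(c).median()))
# ≈ 0.674, 0.0, -0.545, -1.128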
5.2 loc and scale#
loc shifts the distribution; scale stretches it: if \(Z \sim \mathrm{PowerNorm}(c)\), then
\[X = \text{loc} + \text{scale}\cdot Z \sim \mathrm{PowerNorm}(c, \text{loc}, \text{scale}).\]
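This is exactly what SciPy's loc and scale arguments implement; a quick check (a sketch):
# Sketch: quantiles obey the location-scale transform
c, loc, scale = 2.0, 1.5, 0.5
q = np.array([0.1, 0.5, 0.9])
lhs = stats.powernorm(c, loc=loc, scale=scale).ppf(q)
rhs = loc + scale * stats.powernorm(c).ppf(q)
print(np.allclose(lhs, rhs))  # True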
# Shape changes: PDFs for different c
c_values = [0.5, 1.0, 2.0, 5.0]
# Choose a common plotting range based on central quantiles
qs = np.array([0.001, 0.999])
lo = min(stats.powernorm(c).ppf(qs[0]) for c in c_values)
hi = max(stats.powernorm(c).ppf(qs[1]) for c in c_values)
x = np.linspace(lo, hi, 800)
fig = go.Figure()
for c in c_values:
fig.add_trace(
go.Scatter(x=x, y=stats.powernorm(c).pdf(x), mode="lines", name=f"c={c}")
)
fig.update_layout(
title="PowerNorm PDF for different c (standardized)",
xaxis_title="x",
yaxis_title="density",
width=900,
height=430,
)
fig
6) Derivations#
6.1 Expectation (standardized)#
Start from the definition:
\[\mathbb{E}[X] = \int_{-\infty}^{\infty} x\, c\,\phi(x)\left[\Phi(-x)\right]^{c-1}\,dx.\]
Substitute \(u=\Phi(-x)\) so \(du=-\phi(x)\,dx\) and \(x=-\Phi^{-1}(u)\):
\[\mathbb{E}[X] = \int_{0}^{1} \left(-\Phi^{-1}(u)\right) c\,u^{c-1}\,du = -c\int_{0}^{1} \Phi^{-1}(u)\,u^{c-1}\,du.\]
No general closed form is known; compute numerically (SciPy expect/stats) or by Monte Carlo.
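For example (a sketch), the mean for \(c=3\) agrees between SciPy's expect and the Beta-substitution integral:
# Sketch: E[X] via dist.expect vs the u-substitution integral
from scipy import integrate
c = 3.0
mean_expect = stats.powernorm(c).expect(lambda x: x)
mean_quad, _ = integrate.quad(lambda u: -stats.norm.ppf(u) * c * u ** (c - 1.0), 0.0, 1.0)
print(mean_expect, mean_quad)  # both ≈ -0.846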
6.2 Variance#
Similarly,
\[\mathbb{E}\left[X^2\right] = \int_{0}^{1} \left(\Phi^{-1}(u)\right)^2 c\,u^{c-1}\,du,\]
and
\[\mathrm{Var}(X) = \mathbb{E}\left[X^2\right] - \left(\mathbb{E}[X]\right)^2.\]
6.3 Likelihood (with loc, scale)#
Let \(x_1,\dots,x_n\) be i.i.d. from \(\mathrm{PowerNorm}(c,\text{loc},\text{scale})\) and define \(z_i=(x_i-\text{loc})/\text{scale}\). The log-likelihood is
\[\ell(c,\text{loc},\text{scale}) = n\log c - n\log \text{scale} + \sum_{i=1}^{n} \log\phi(z_i) + (c-1)\sum_{i=1}^{n} \log\Phi(-z_i).\]
Conditional MLE for \(c\) (given loc, scale)
Differentiate w.r.t. \(c\):
\[\frac{\partial \ell}{\partial c} = \frac{n}{c} + \sum_{i=1}^{n} \log\Phi(-z_i).\]
Setting this to zero yields a closed-form conditional estimator:
\[\hat{c} = -\,\frac{n}{\sum_{i=1}^{n} \log\Phi(-z_i)}.\]
Numerical note: use logcdf for stability when \(\Phi(-z_i)\) is tiny.
# Demonstration: conditional MLE for c when loc/scale are known
c_true = 2.5
x = stats.powernorm(c_true).rvs(size=5_000, random_state=rng)
log_u = stats.norm.logcdf(-x) # log Phi(-x)
c_hat = -x.size / log_u.sum()
c_true, float(c_hat)
(2.5, 2.4874559748406924)
7) Sampling & Simulation#
7.1 Inverse-CDF sampling#
From the CDF
\[F(x) = 1 - \left[\Phi(-x)\right]^c,\]
set \(U\sim\mathrm{Uniform}(0,1)\) and solve \(U=F(X)\):
\[X = -\Phi^{-1}\!\left((1-U)^{1/c}\right).\]
Equivalently (renaming \(1-U\), which is also Uniform):
\[X = -\Phi^{-1}\!\left(U^{1/c}\right).\]
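With SciPy in hand this is a two-liner (a sketch; rng_demo is a throwaway generator). The next subsection removes the SciPy dependency by rebuilding \(\Phi^{-1}\) in pure NumPy.
# Sketch: inverse-transform sampling via SciPy's norm.ppf
rng_demo = np.random.default_rng(1)
c = 4.0
u = np.clip(rng_demo.random(20_000), 1e-12, 1.0 - 1e-12)  # guard endpoints
x_inv = -stats.norm.ppf(u ** (1.0 / c))
print(x_inv.mean(), float(stats.powernorm(c).mean()))  # both ≈ -1.03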
7.2 NumPy-only implementation#
NumPy does not ship a vectorized Normal inverse-CDF, so below we implement a high-quality rational approximation (Acklam’s approximation) using only NumPy.
def norm_ppf_acklam(p: np.ndarray) -> np.ndarray:
'''Approximate standard Normal quantile function Φ^{-1}(p).
Vectorized rational approximation due to Peter John Acklam.
Accuracy is typically ~1e-9 in the central region.
Parameters
----------
p : array-like
Probabilities in (0, 1).
'''
p = np.asarray(p, dtype=float)
if np.any((p <= 0) | (p >= 1)):
raise ValueError('p must be strictly between 0 and 1')
# Coefficients in rational approximations
a = np.array(
[
-3.969683028665376e01,
2.209460984245205e02,
-2.759285104469687e02,
1.383577518672690e02,
-3.066479806614716e01,
2.506628277459239e00,
]
)
b = np.array(
[
-5.447609879822406e01,
1.615858368580409e02,
-1.556989798598866e02,
6.680131188771972e01,
-1.328068155288572e01,
]
)
c = np.array(
[
-7.784894002430293e-03,
-3.223964580411365e-01,
-2.400758277161838e00,
-2.549732539343734e00,
4.374664141464968e00,
2.938163982698783e00,
]
)
d = np.array(
[
7.784695709041462e-03,
3.224671290700398e-01,
2.445134137142996e00,
3.754408661907416e00,
]
)
plow = 0.02425
phigh = 1.0 - plow
x = np.empty_like(p)
# Lower region
mask = p < plow
if np.any(mask):
q = np.sqrt(-2.0 * np.log(p[mask]))
num = (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q) + c[5]
den = ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q) + 1.0
        x[mask] = num / den  # lower tail: the quantile is negative here
# Central region
mask = (p >= plow) & (p <= phigh)
if np.any(mask):
q = p[mask] - 0.5
r = q * q
num = (
(((((a[0] * r + a[1]) * r + a[2]) * r + a[3]) * r + a[4]) * r) + a[5]
) * q
den = (((((b[0] * r + b[1]) * r + b[2]) * r + b[3]) * r + b[4]) * r) + 1.0
x[mask] = num / den
# Upper region
mask = p > phigh
if np.any(mask):
q = np.sqrt(-2.0 * np.log(1.0 - p[mask]))
num = (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q) + c[5]
den = ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q) + 1.0
        x[mask] = -num / den  # upper tail: mirror of the lower tail (positive quantile)
return x
def powernorm_rvs_numpy(
c: float,
size: int | tuple[int, ...] = 1,
loc: float = 0.0,
scale: float = 1.0,
rng: np.random.Generator | None = None,
) -> np.ndarray:
'''NumPy-only sampler for PowerNorm(c, loc, scale) using inverse transform.'''
c = float(c)
loc = float(loc)
scale = float(scale)
if c <= 0:
raise ValueError('c must be > 0')
if scale <= 0:
raise ValueError('scale must be > 0')
if rng is None:
rng = np.random.default_rng()
eps = np.finfo(float).eps
u = rng.random(size=size)
u = np.clip(u, eps, 1.0 - eps)
z = -norm_ppf_acklam(u ** (1.0 / c))
return loc + scale * z
# Quick sanity check: NumPy-only sampler vs SciPy moments
c0 = 4.0
n = 200_000
samples_np = powernorm_rvs_numpy(c0, size=n, rng=rng)
mean_np = samples_np.mean()
var_np = samples_np.var(ddof=0)
mean_sp, var_sp = stats.powernorm(c0).stats(moments="mv")
# Round for a stable display: sampler and SciPy should agree closely
np.round([float(mean_np), float(mean_sp), float(var_np), float(var_sp)], 2)
array([-1.03, -1.03,  0.49,  0.49])
8) Visualization#
We’ll visualize:
the PDF for a fixed parameter choice
the CDF vs empirical CDF from Monte Carlo samples
a histogram of Monte Carlo samples against the theoretical density
c0 = 4.0
n = 120_000
samples = powernorm_rvs_numpy(c0, size=n, rng=rng)
dist = stats.powernorm(c0)
x = np.linspace(dist.ppf(0.001), dist.ppf(0.999), 900)
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=samples,
nbinsx=120,
histnorm="probability density",
name="samples (NumPy)",
opacity=0.35,
)
)
fig.add_trace(go.Scatter(x=x, y=dist.pdf(x), mode="lines", name="theoretical pdf"))
fig.update_layout(
title=f"PowerNorm(c={c0}): histogram vs PDF",
xaxis_title="x",
yaxis_title="density",
width=900,
height=430,
)
fig
# CDF: theoretical vs empirical
x = np.linspace(dist.ppf(0.001), dist.ppf(0.999), 900)
emp_x = np.sort(samples)
emp_cdf = np.arange(1, emp_x.size + 1) / emp_x.size
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=dist.cdf(x), mode="lines", name="theoretical CDF"))
fig.add_trace(
go.Scatter(
x=emp_x[::200],
y=emp_cdf[::200],
mode="markers",
name="empirical CDF (subsampled)",
marker=dict(size=4, opacity=0.6),
)
)
fig.update_layout(
title=f"PowerNorm(c={c0}): theoretical CDF vs empirical CDF",
xaxis_title="x",
yaxis_title="CDF",
width=900,
height=420,
)
fig
9) SciPy Integration (scipy.stats.powernorm)#
SciPy parameterization:
stats.powernorm(c, loc=0, scale=1)
c is the shape parameter (\(c>0\)); loc and scale provide the usual location–scale transform.
Useful methods: pdf, logpdf, cdf, sf, ppf, rvs, stats, entropy, fit.
dist = stats.powernorm(2.5, loc=1.0, scale=2.0)
x = np.linspace(dist.ppf(0.01), dist.ppf(0.99), 5)
pdf = dist.pdf(x)
cdf = dist.cdf(x)
ppf = dist.ppf(np.array([0.1, 0.5, 0.9]))
samples = dist.rvs(size=5, random_state=rng)
pdf, cdf, ppf, samples
(array([0.0148, 0.1034, 0.2517, 0.1575, 0.0191]),
array([0.01 , 0.0996, 0.4358, 0.8525, 0.99 ]),
array([-2.4723, -0.3989, 1.5165]),
array([ 1.6074, -1.0545, -1.0905, -0.5854, -1.9082]))
# Fitting (MLE) with SciPy
# Tip: if you know data are already standardized, fix loc=0 and scale=1.
c_true, loc_true, scale_true = 3.0, -0.5, 1.2
x = stats.powernorm(c_true, loc=loc_true, scale=scale_true).rvs(size=8_000, random_state=rng)
c_hat, loc_hat, scale_hat = stats.powernorm.fit(x)
(c_true, loc_true, scale_true), (float(c_hat), float(loc_hat), float(scale_hat))
((3.0, -0.5, 1.2),
(2.976137952372347, -0.4973638148560626, 1.2194210660018447))
10) Statistical Use Cases#
10.1 Hypothesis testing#
A common question is whether data are well-modeled by a Normal distribution. Since \(c=1\) recovers the Normal, you can test:
\(H_0: c=1\) (Normal)
\(H_1: c\neq 1\) (PowerNorm)
using a likelihood ratio test (LRT) when loc and scale are known/fixed (or under large-sample approximations).
10.2 Bayesian modeling#
Treat \(c\) as an unknown parameter with a prior (e.g. log-normal), and compute a posterior over \(c\). There is no conjugacy, but grid inference works well for a single parameter.
10.3 Generative modeling#
Because for integer \(c\) the distribution equals the minimum of \(c\) Normals, it provides a simple generative story for “worst-case” effects.
# 10.1 Likelihood-ratio test (loc=0, scale=1 assumed known)
n = 2_000
c_true = 2.0
x = stats.powernorm(c_true).rvs(size=n, random_state=rng)
# Under H1, use the closed-form conditional MLE for c (since loc/scale known)
log_u = stats.norm.logcdf(-x)
c_hat = -n / log_u.sum()
ll0 = powernorm_logpdf(x, c=1.0).sum()
ll1 = powernorm_logpdf(x, c=float(c_hat)).sum()
lrt = 2 * (ll1 - ll0)
p_value = stats.chi2.sf(lrt, df=1)
{
"c_true": c_true,
"c_hat": float(c_hat),
"LRT": float(lrt),
"p_value(chi2, df=1)": float(p_value),
}
{'c_true': 2.0,
'c_hat': 2.0048303375756196,
'LRT': 777.4190504574754,
'p_value(chi2, df=1)': 4.381812106467419e-171}
# 10.2 Bayesian grid inference for c (loc=0, scale=1 fixed)
x = stats.powernorm(2.0).rvs(size=500, random_state=rng)
n = x.size
sum_log_phi = stats.norm.logpdf(x).sum()
sum_log_u = stats.norm.logcdf(-x).sum() # sum log Phi(-x)
c_grid = np.linspace(0.2, 6.0, 700)
# Prior: log c ~ Normal(0, 0.7)
logc = np.log(c_grid)
log_prior = stats.norm.logpdf(logc, loc=0.0, scale=0.7) - logc
log_lik = n * np.log(c_grid) + sum_log_phi + (c_grid - 1.0) * sum_log_u
log_post_unnorm = log_prior + log_lik
log_post_unnorm -= log_post_unnorm.max()
post = np.exp(log_post_unnorm)
post /= np.trapz(post, c_grid)
post_mean = np.trapz(c_grid * post, c_grid)
fig = go.Figure()
fig.add_trace(go.Scatter(x=c_grid, y=post, mode="lines", name="posterior density"))
fig.add_vline(x=post_mean, line_dash="dash", line_color="black", annotation_text="posterior mean")
fig.update_layout(
title=f"Posterior over c (grid); posterior mean ≈ {post_mean:.3f}",
xaxis_title="c",
yaxis_title="density",
width=900,
height=420,
)
fig
# 10.3 Generative story: min of c Normals (integer c)
c_int = 6
n_groups = 80_000
mins = rng.standard_normal((n_groups, c_int)).min(axis=1)
dist = stats.powernorm(c_int)
x = np.linspace(dist.ppf(0.001), dist.ppf(0.999), 900)
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=mins,
nbinsx=120,
histnorm="probability density",
name="min of c Normals",
opacity=0.35,
)
)
fig.add_trace(go.Scatter(x=x, y=dist.pdf(x), mode="lines", name="PowerNorm(c) pdf"))
fig.update_layout(
title=f"min of {c_int} i.i.d. Normals ≈ PowerNorm(c={c_int})",
xaxis_title="x",
yaxis_title="density",
width=900,
height=430,
)
fig
11) Pitfalls#
Invalid parameters: c <= 0 or scale <= 0 is not valid.
Interpreting c as a sample size: the “minimum of \(c\) Normals” story is exact only when \(c\) is an integer.
Numerical underflow in tails: \(\Phi(-x)\) can be extremely small for large positive \(x\), and directly computing \((\Phi(-x))^c\) can underflow to 0. Prefer log-space computations: stats.norm.logcdf and powernorm.logpdf (see the first sketch after this list).
Fitting can be sensitive: with free loc and scale, likelihood surfaces can be flat or multi-modal for small datasets. If domain knowledge suggests a fixed loc/scale, constrain them (e.g., floc=0, fscale=1) to stabilize the MLE.
Approximate NumPy-only sampler: the Acklam approximation is accurate, but extreme tail quantiles can still accumulate error. For production-grade sampling in tails, prefer SciPy's powernorm.rvs (see the second sketch after this list).
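Two short sketches of the numerical points above: the first shows the tail underflow and its log-space fix, the second spot-checks norm_ppf_acklam against SciPy's exact quantile function.
# Sketch 1: (Phi(-x))^c underflows; stay in log space instead
x, c = 15.0, 10.0
print(stats.norm.cdf(-x) ** c)    # 0.0 (underflow)
print(c * stats.norm.logcdf(-x))  # ≈ -1161: log S(x), finite and usable
# Sketch 2: Acklam approximation vs SciPy's norm.ppf
p = np.linspace(1e-6, 1.0 - 1e-6, 100_001)
err = np.abs(norm_ppf_acklam(p) - stats.norm.ppf(p))
print(err.max())  # small (on the order of 1e-9)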
12) Summary#
powernorm is a continuous distribution on \(\mathbb{R}\) with shape parameter \(c>0\).
It is defined by raising the Normal survival function to a power: \(S(x)=\left[\Phi(-x)\right]^c\).
\(c=1\) recovers the Normal; larger \(c\) corresponds to more extreme minima (left shift and negative skew).
Moments, MGF/CF, and entropy are typically evaluated numerically; SciPy provides reliable routines.
Sampling is easy via inverse-CDF: \(X=-\Phi^{-1}(U^{1/c})\).
References
SciPy documentation: scipy.stats.powernorm
NIST Engineering Statistics Handbook, “Power Normal Distribution”